Abstract: The use of a Kalman filter (KF) interferes with
fault detection algorithms based on the residual between
estimated and measured variables, since the measured values
are used to update the estimates. This feedback pulls the
estimates closer to the measured values, which shrinks the
residuals. Here we present a fault detection scheme for
systems that are being tracked by a KF. Our approach combines
an open-loop prediction over an adaptive window with an
information-based measure of the deviation of the Kalman
estimate from the prediction to improve fault detection.

I. Introduction 

A. Kalman Filter 

Consider a discrete-time controlled process that is governed
by a linear stochastic difference equation (1) and a
measurement (2):

x(t_i) = A x(t_{i-1}) + B u(t_i) + w(t_i)    (1)

z(t_i) = H x(t_i) + v(t_i)    (2)

w(t_i) and v(t_i) represent the process and measurement noise
respectively and are assumed to be independent, white and
Gaussian with probability distributions N(0, Q) and N(0, R)
respectively. Given the noise in the process and measurements,
the KF [1] computes an unbiased estimate \hat{x} of the
state x by providing an optimal solution of the least-squares
method. This is achieved by recursively minimizing the a
posteriori estimate error covariance P(t_i) = E[e(t_i) e^T(t_i)],
where e(t_i) = x(t_i) - \hat{x}(t_i) is the a posteriori error
between the true state x(t_i) and the a posteriori state
estimate \hat{x}(t_i). First the state and error covariance
estimates are projected forward from time t_{i-1} to time t_i
through the following equations:

\hat{x}(t_i^-) = A \hat{x}(t_{i-1}) + B u(t_i)    (3)

P(t_i^-) = A P(t_{i-1}) A^T + Q    (4)

where t_i^- indicates a priori values. An adaptive gain factor
K minimizes (in the least-squares sense) the error covariance.
Noisy measurements of the process are then used to compute the
a posteriori state estimate. Finally the a posteriori
covariance estimate is computed.

E. Benazera is with RIACS, S. Narasimhan is with QSS Group
Inc. / NASA Ames Research Center, Moffett Field, CA, USA.
ebenazer@email.arc.nasa.gov, sriram@email.arc.nasa.gov

These three steps are summarized as:

K(t_i) = P(t_i^-) H^T (H P(t_i^-) H^T + R)^{-1}    (5)

\hat{x}(t_i) = \hat{x}(t_i^-) + K(t_i) (z(t_i) - H \hat{x}(t_i^-))    (6)

P(t_i) = (I - K(t_i) H) P(t_i^-)    (7)

The KF has been the subject of extensive research and 
applications ([2]). 
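
As an illustration only, equations (3) to (7) can be written out for a scalar system, where A, B, H, Q and R reduce to plain numbers. The function name `kf_step` and all numerical values below are assumptions made for this sketch and do not come from the original work.

```python
def kf_step(x_post, p_post, u, z, a, b, h, q, r):
    """One scalar Kalman filter iteration following equations (3)-(7)."""
    # Prediction: equations (3) and (4).
    x_prior = a * x_post + b * u
    p_prior = a * p_post * a + q
    # Gain: equation (5).
    k = p_prior * h / (h * p_prior * h + r)
    # Measurement update: equations (6) and (7).
    residual = z - h * x_prior
    x_new = x_prior + k * residual
    p_new = (1.0 - k * h) * p_prior
    return x_new, p_new, k, residual


# Hypothetical run: unit input, three made-up measurements.
x, p = 0.0, 1.0
for z in [1.1, 1.9, 3.2]:
    x, p, k, eps = kf_step(x, p, u=1.0, z=z, a=1.0, b=1.0, h=1.0, q=0.01, r=0.1)
```

The function returns the gain and the residual along with the estimate, since both quantities are reused by the fault detection scheme discussed in the rest of the paper.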

B. Fault Detection and the Kalman Filter 

We argue that in several situations the KF is at
cross-purposes with fault detection. First, the KF is designed
to filter out any deviations between the measurements and the
predictions by using the measurement updates. As a result the
magnitude of the residual \epsilon(t_i) = z(t_i) - H \hat{x}(t_i^-)
is reduced, affecting the fault detection capability. Second,
when the measurement noise is high the error covariance is so
large that even a large residual falls well within its bounds.
Furthermore, since the gain factor K does not depend on the
input matrix B, the covariance minimization is not affected
by any faults on the input ([3]).
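
The first point can be illustrated with a toy, noise-free simulation. This is a sketch under assumed values, not an experiment from the paper: a scalar system suffers an input fault (the true input gain drops from 1.0 to 0.5), and the KF residual is compared with the residual of a prediction that is never corrected by measurements. The model values, the fault, and the function name `simulate` are all hypothetical.

```python
def simulate(steps=200, fault_step=50):
    """Compare the KF residual with a pure open-loop residual after a fault."""
    a, b, h = 0.9, 1.0, 1.0   # hypothetical scalar model used by the filter
    q, r = 0.01, 0.01         # assumed noise variances given to the filter
    u = 1.0                   # constant input
    x_true = 10.0             # nominal steady state of x = a*x + b*u
    x_kf, p = 10.0, 1.0       # KF estimate and its covariance
    x_ol = 10.0               # open-loop prediction, never updated by z
    kf_residual = ol_residual = 0.0
    for i in range(steps):
        # True system: the input gain is halved once the fault occurs.
        b_true = 0.5 if i >= fault_step else b
        x_true = a * x_true + b_true * u
        z = h * x_true        # noise-free measurement, for clarity
        # Kalman filter, equations (3)-(7).
        x_prior = a * x_kf + b * u
        p_prior = a * p * a + q
        k = p_prior * h / (h * p_prior * h + r)
        kf_residual = z - h * x_prior
        x_kf = x_prior + k * kf_residual
        p = (1.0 - k * h) * p_prior
        # Open-loop prediction with the nominal model only.
        x_ol = a * x_ol + b * u
        ol_residual = z - h * x_ol
    return kf_residual, ol_residual


kf_res, ol_res = simulate()
```

In this setting the KF residual settles at a much smaller magnitude than the open-loop residual, because each measurement update drags the estimate toward the faulty measurements. Note that the open-loop prediction here runs from the initial state, whereas the scheme in this paper restarts it from the KF estimate n steps back.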

II. Preliminaries 

A. n-step predictor 

We define the n-step predictor of the state, \hat{x}_n(t_i), to
be the n-step open-loop estimate of the state. \hat{x}_n(t_i) is
computed recursively by taking the KF state estimate at time
t_{i-n} and then projecting it forward for n steps using
equation (3). The covariance is also projected forward using
equation (4).

\hat{x}_n(t_i) = A^n \hat{x}(t_{i-n}) + \sum_{j=1}^{n} A^{n-j} B u(t_{i-n+j})    (8)

P_n(t_i) = A^n P(t_{i-n}) (A^T)^n + \sum_{j=0}^{n-1} A^j Q (A^T)^j    (9)

Let X_n(t_i) ~ N(\hat{x}_n(t_i), P_n(t_i)) be the random variable
corresponding to the n-step prediction. We define D_n(t_i) =
X_n(t_i) - X(t_i), and D_n(t_i) ~ N(E[D_n(t_i)], cov(D_n(t_i))):

E[D_n(t_i)] = d_n(t_i) = \hat{x}_n(t_i) - \hat{x}(t_i)
            = A d_{n-1}(t_{i-1}) + K(t_i) (H (B u(t_i) + A \hat{x}(t_{i-1})) - z(t_i))    (10)

If n = 1, then \hat{x}_0(t_{i-1}) = \hat{x}(t_{i-1}) and
d_1(t_i) = -K(t_i) \epsilon(t_i), with \epsilon(t_i) the residual
z(t_i) - H \hat{x}(t_i^-), as d_0(t_i) = 0.
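
A minimal scalar sketch of the n-step predictor follows. It shows that iterating equations (3) and (4) n times gives the same result as the closed forms (8) and (9). The function names and the test values are assumptions made for illustration.

```python
def n_step_predict(x_hat, p, inputs, a, b, q):
    """Project (x_hat, p) forward open-loop, one step per input value."""
    for u in inputs:
        x_hat = a * x_hat + b * u   # equation (3), no measurement update
        p = a * p * a + q           # equation (4)
    return x_hat, p


def n_step_closed_form(x_hat, p, inputs, a, b, q):
    """Closed forms (8) and (9); inputs[j - 1] plays the role of u(t_{i-n+j})."""
    n = len(inputs)
    x_n = a ** n * x_hat + sum(
        a ** (n - j) * b * inputs[j - 1] for j in range(1, n + 1)
    )
    p_n = a ** n * p * a ** n + sum(a ** j * q * a ** j for j in range(n))
    return x_n, p_n


# Hypothetical values: start from a KF estimate (1.0, 0.3), three inputs.
x_rec, p_rec = n_step_predict(1.0, 0.3, [1.0, 2.0, -1.0], a=0.9, b=0.5, q=0.02)
x_cf, p_cf = n_step_closed_form(1.0, 0.3, [1.0, 2.0, -1.0], a=0.9, b=0.5, q=0.02)
```

The recursive form is the one a filter loop would normally use; the closed form is given only to make the correspondence with (8) and (9) explicit.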




III. Fault Detection

When the system is behaving nominally (without faults),
we expect the measurements, the estimates and the n-step
prediction to be close to each other. We examine the quality
of the produced state estimate given the measurements and the
open-loop estimate (with limited measurement influence). The
probability of the estimated state X(t_i) at any time t_i given
the measurement z(t_i) and the n-step prediction X_n(t_i) is
given by:

p(X(t_i) | z(t_i), X_n(t_i))
    = p(z(t_i) | X_n(t_i), X(t_i)) p(X_n(t_i) | X(t_i)) p(X(t_i)) / p(z(t_i), X_n(t_i))    (11)

For each value of x, the higher the probability returned
by this distribution, the more we expect the system to be
nominal.

A. Likelihood Indicators

To make decisions, we use a likelihood (L) indicator:

L(N | z(t_i), X_n(t_i)) = L(z(t_i) | X_n(t_i)) L(X_n(t_i) | X(t_i)) p_N    (12)

where L(z(t_i) | X_n(t_i)) is the a priori n-step measurement
likelihood, L(X_n(t_i) | X(t_i)) is the a posteriori n-step
prediction likelihood, and p_N is the fixed probability
that no fault occurs at each time step. The a priori n-step
measurement likelihood L(z(t_i) | X_n(t_i)) is based on the
distance between z(t_i) and \hat{z}_n(t_i) = H \hat{x}_n(t_i)
given the covariance S_n(t_i) = H P_n(t_i) H^T. This distance is
expected to be small under nominal behavior, and to increase
when a fault occurs. We have:

L(z(t_i) | X_n(t_i)) ~ N(\hat{z}_n(t_i), S_n(t_i))

where f_{X(t_i)}(x) = C(P(t_i)) exp(-1/2 (x - \hat{x}(t_i))^T P^{-1}(t_i) (x - \hat{x}(t_i)))
and C(P(t_i)) = 1 / ((2\pi)^{n_x/2} |P(t_i)|^{1/2}). Due to the
potentially large variance S_n(t_i), L(z(t_i) | X_n(t_i)) may not
be sufficient for quick detection.

The a posteriori n-step prediction likelihood L(X_n(t_i) |
X(t_i)) assesses the distance between X_n(t_i) and X(t_i). We
examine the Kullback-Leibler (KL) divergence [4] between
X_n(t_i) and X(t_i), which measures how different the two
distributions are (footnote 1). KL(X_n(t_i), X(t_i)) can be
understood as the average number of bits that are wasted by
encoding events from the predicted distribution (over n steps)
with a code based on the estimated distribution. Therefore, the
fewer bits are wasted, the more likely it is that the system
behavior is nominal. We thus write L(X_n(t_i) | X(t_i)) =
KL(C(P_n(t_i)) - X_n(t_i), X(t_i)). This number is typically
infinite, as the surface under C(P_n(t_i)) - f_{X_n(t_i)} is
infinite (see figure 1). We therefore study the number of wasted
bits over 99.7% of X's variance instead, to get a good
approximation (through Monte-Carlo (MC) simulation). Based on
this, the fault indicator (F) mirrors the nominal one:

L(F | z(t_i), X_n(t_i))
    = (C(P_n(t_i)) - L(z(t_i) | X_n(t_i))) KL(X_n(t_i), X(t_i)) p_F    (13)

where p_F = 1 - p_N.

B. Fault decision

Considering the two classes N and F and their respective
conditional likelihoods L(N | z(t_i), X_n(t_i)) and L(F |
z(t_i), X_n(t_i)), two decision functions are built that
discriminate between the two classes given z(t_i) and X_n(t_i):

g_N(t_i) = log(L(z(t_i) | X_n(t_i))) + L(X_n(t_i) | X(t_i)) + log(p_N)

and

g_F(t_i) = log(C(P_n(t_i)) - L(z(t_i) | X_n(t_i)))
           + KL(X_n(t_i), X(t_i)) + log(p_F)    (14)

The overall decision function is then based on the sign of

\delta_n(z(t_i), X_n(t_i), X(t_i)) = g_N(t_i) - g_F(t_i)    (15)

and is given by: a fault occurred if

\delta_n(z(t_i), X_n(t_i), X(t_i)) < 0.
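
The two ingredients of the indicators, a Gaussian measurement likelihood and a KL divergence measured in bits, can be sketched for the scalar case as follows. This is only an illustration under stated assumptions: it uses the closed-form KL divergence between two untruncated Gaussians, not the truncated, Monte-Carlo-based quantity described above, and the function names and numerical values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a scalar Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def kl_gaussian_bits(mean_p, var_p, mean_q, var_q):
    """Closed-form KL(p || q) in bits for two scalar Gaussians."""
    nats = 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mean_p - mean_q) ** 2) / var_q
        - 1.0
    )
    return nats / math.log(2.0)


# Hypothetical values for one time step of a scalar system.
h = 1.0
x_n, p_n = 2.0, 0.5      # n-step prediction X_n(t_i)
x_hat, p = 2.3, 0.2      # KF estimate X(t_i)
z = 2.4                  # measurement z(t_i)

# A priori n-step measurement likelihood with S_n = H P_n H^T.
meas_likelihood = gaussian_pdf(z, h * x_n, h * p_n * h)
# Wasted bits when events from X_n are coded with a code built for X.
wasted_bits = kl_gaussian_bits(x_n, p_n, x_hat, p)
```

Swapping the arguments of `kl_gaussian_bits` generally changes the result, which is the asymmetry that prevents the KL divergence from being a true distance.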

C. Determining n dynamically 

One key factor in the effectiveness of our fault detector is
the value of n. Here, we propose to adapt n dynamically.
We study the changes in the decision line (15) as a result of
a unit change in n: this amounts to comparing the decision
lines for the n-step and (n+1)-step predictors. We write
\delta_{n+1,n} = \delta_{n+1} - \delta_n. The length of this
short paper precludes a complete development, so we give the
reader a brief outline of our method: we study the derivatives
of \delta_{n+1,n} w.r.t. z(t_i) and \hat{x}_n(t_i), then the
orientation of these two vectors of derivatives with respect to
each other in the observation space: if they are negatively
oriented, n stays unchanged; otherwise the sign of
\delta_{n+1,n} decides whether n is incremented or decremented.
For this we need to project the derivative with respect to
\hat{x}_n(t_i) onto the observation space. With \theta being the
angle between the vectors in the observation space, we have:

\cos \theta = [...]

where ||.|| denotes the \ell_2 norm. The adaptation strategy for
n is then given by:

if \pi < \theta < 2\pi:           n(t_{i+1}) = n(t_i)
else if \delta_{n+1,n} > 0:     n(t_{i+1}) = n(t_i) - 1    (16)
else (\delta_{n+1,n} < 0):      n(t_{i+1}) = n(t_i) + 1

1 Note that it is not a true distance, as it is not symmetric.
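
The adaptation rule (16) can be sketched as a small function. To avoid committing to a particular angle test, the orientation check is passed in as a boolean. The clamping to [n_min, n_max], the default bounds, and the handling of \delta_{n+1,n} = 0 are assumptions made for this sketch and are not specified by the rule itself.

```python
def adapt_window(n, negatively_oriented, delta_diff, n_min=1, n_max=50):
    """Return the window length for the next time step, following rule (16).

    negatively_oriented: True when the two derivative vectors are
        negatively oriented in the observation space (n is then kept).
    delta_diff: the value of delta_{n+1,n} = delta_{n+1} - delta_n.
    n_min, n_max: hypothetical bounds on the window length.
    """
    if negatively_oriented:
        return n
    if delta_diff > 0:
        n_new = n - 1
    elif delta_diff < 0:
        n_new = n + 1
    else:
        n_new = n   # assumption: leave n unchanged when the difference is zero
    # Assumption: keep the window inside the allowed range.
    return max(n_min, min(n_max, n_new))


n = 5
n = adapt_window(n, negatively_oriented=False, delta_diff=-0.3)   # grows to 6
n = adapt_window(n, negatively_oriented=True, delta_diff=0.8)     # stays at 6
```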
D. Implementation 

Algorithm 1 presents the filter loop at time step t_i. It is
initialized with \hat{x}(0) = \hat{x}_n(0) = x_0, P(0) = P_n(0) = P_0
and n = n_min. The implementation requires storing or
recomputing several values and matrices: \epsilon(t_{i-n}),
K(t_{i-n}), P(t_{i-n}^-). This is consistent with modern
diagnosis engines that work on a fixed temporal window [5],
although it increases the computational complexity of the KF.
The following results help in mitigating the computational
effort:

d_{n+1}(t_i) - d_n(t_i) = -A^n K(t_{i-n}) \epsilon(t_{i-n})    (17)
P_{n+1}(t_i) - P_n(t_i) = A^n K(t_{i-n}) H P(t_{i-n}^-) (A^n)^T    (18)

These relations appear in steps 2 and 3:


1: Standard Kalman filter prediction and update.

2: d_n(t_i) computation:

   d_{n-1}(t_{i-1}) = d_n(t_{i-1}) + A^{n-1} K(t_{i-n}) \epsilon(t_{i-n})
   d_n(t_i) = A d_{n-1}(t_{i-1}) + K(t_i) (H B u(t_i) + H A \hat{x}(t_{i-1}) - z(t_i))

3: n-step prediction:

   \hat{x}_n(t_i) = d_n(t_i) + \hat{x}(t_i)
   P_{n-1}(t_{i-1}) = P_n(t_{i-1}) - A^{n-1} K(t_{i-n}) H P(t_{i-n}^-) (A^{n-1})^T
   P_n(t_i) = A P_{n-1}(t_{i-1}) A^T + Q

4: Fault detection: sign(\delta_n(z(t_i), X_n(t_i), X(t_i))).

5: Adapt n according to \theta and \delta_{n+1,n}, given
   X_n(t_i), X_{n+1}(t_i), X(t_i).

Algorithm 1: KF with n-step open-loop prediction and fault detection
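
As a sanity check, relations (17) and (18) can be verified numerically on a scalar system by storing the filter history and comparing the n-step and (n+1)-step open-loop predictions directly. This is a sketch under assumed values: the function names, the measurement sequence and the model parameters are hypothetical.

```python
def run_kf(zs, us, a, b, h, q, r, x0, p0):
    """Run a scalar KF and record the quantities needed by (17) and (18)."""
    hist = []
    x, p = x0, p0
    for z, u in zip(zs, us):
        x_prior = a * x + b * u
        p_prior = a * p * a + q
        k = p_prior * h / (h * p_prior * h + r)
        eps = z - h * x_prior
        x = x_prior + k * eps
        p = (1.0 - k * h) * p_prior
        hist.append({"x": x, "p": p, "k": k, "eps": eps, "p_prior": p_prior})
    return hist


def open_loop(hist, us, i, n, a, b, q):
    """n-step prediction at index i, started from the KF estimate at i - n."""
    x, p = hist[i - n]["x"], hist[i - n]["p"]
    for j in range(i - n + 1, i + 1):
        x = a * x + b * us[j]
        p = a * p * a + q
    return x, p


a, b, h, q, r = 0.95, 1.0, 1.0, 0.05, 0.2
zs = [1.0, 2.3, 2.9, 4.2, 5.1, 5.8]   # made-up measurements
us = [1.0] * len(zs)
hist = run_kf(zs, us, a, b, h, q, r, x0=0.0, p0=1.0)

i, n = 5, 2
x_n, p_n = open_loop(hist, us, i, n, a, b, q)
x_n1, p_n1 = open_loop(hist, us, i, n + 1, a, b, q)
past = hist[i - n]

# d_{n+1} - d_n equals x_{n+1} - x_n, since both subtract the same estimate.
lhs_17 = x_n1 - x_n
rhs_17 = -(a ** n) * past["k"] * past["eps"]
lhs_18 = p_n1 - p_n
rhs_18 = a ** n * past["k"] * h * past["p_prior"] * a ** n
```

Both sides agree up to floating-point error, which is why only \epsilon(t_{i-n}), K(t_{i-n}) and P(t_{i-n}^-) need to be kept to move between window lengths.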
IV. Application 

The application that motivated this research was a hybrid
diagnosis engine that combines a Rao-Blackwellized particle
filter (RBPF) [6] with a logical approach to diagnosis [7].
An efficient fault detector was needed to coordinate the
tracking and the consistency-based engine for logical
diagnosis, i.e. for deciding when to trigger the latter or
return to the former.

Fig. 2. Bottom graph: a simulated thermostat fails to turn off around
step 370. Middle graph: identified modes (percentages). The RBPF with
embedded fault detector raises alarms on early mode changes and lowers
the weight of particles that identify the wrong mode. Top graph: the
percentage of alarming particles (over 100 particles). The bumps
correspond to the system's nominal and faulty mode changes.

As a preliminary test, we plugged the fault detector into
the RBPF and tracked a simulated noisy thermostat. The
RBPF tracks multi-modal linear systems with Gaussian
noise. The belief state is a mixture of Gaussians whose
statistics are propagated with a KF. The particle weight is
computed as the observation probability p(z(t_i) | X(t_i^-)).
Our strategy uses the fault detector to assess the quality of
the estimate and lowers the weight of particles that are not in
the correct mode. Figure 2 shows a run on a faulty thermostat
(n < 50): the number of alarming particles rises at each
mode change. Our version of the filter detects wrong modes
and faults almost instantly. Identification, however, depends
on how the modes are sampled.

Unfortunately, on large multi-dimensional continuous
spaces, the computational cost of the detector is very
high due to the MC calls needed for the KL computation.
Moreover, results are disappointing on systems with uncertain
parameters (high process noise) and precise sensing (low
observation noise). For these reasons, we are not using this
detector in our current diagnosis engines.